feat: trust console + memory check layer (v0.15.0) #40
Merged
Baseline commit so subsequent per-bug fixes have minimal diffs. No behavior changes; just brings these files under version control:
- .agent/harness/runtime.py
- .agent/harness/control_plane.py
- .agent/harness/lesson_store.py
- .agent/tools/instances.py
Previously, the second positional arg was assigned to TARGET unconditionally, so the documented form `agentic-stack claude-code --yes` wrote into a literal `--yes/.agent` directory. Now flags are filtered out of positional parsing, TARGET defaults to $PWD when only an adapter and flags are passed, and unknown -flags are rejected rather than silently consumed. Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)
`_parse_args()` previously treated every arg starting with `-` as a flag, silently dropping target paths that begin with `-` and falling back to `os.getcwd()`. Now `--` ends flag parsing, only known flags (--yes/-y/--force/--reconfigure) are consumed as flags, and unrecognized -tokens warn-and-treat-as-path instead of being eaten. Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)
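A minimal sketch of the parsing rule described above, not the actual `_parse_args()`: the function name `parse_args`, the `KNOWN_FLAGS` constant, and the return shape are assumptions made for illustration; only the flag names and the `--` / warn-and-treat-as-path behavior come from the commit message.

```python
import sys

# Flag names are taken from the commit message; the set itself is illustrative.
KNOWN_FLAGS = {"--yes", "-y", "--force", "--reconfigure"}


def parse_args(argv):
    """Split argv into (flags, positionals) without eating dash-prefixed paths."""
    flags, positionals = set(), []
    end_of_flags = False
    for arg in argv:
        if end_of_flags:
            # Everything after "--" is positional, even if it starts with "-".
            positionals.append(arg)
        elif arg == "--":
            end_of_flags = True
        elif arg in KNOWN_FLAGS:
            flags.add(arg)
        else:
            if arg.startswith("-"):
                # Unknown dash token: warn, then keep it as a path.
                print(f"warning: unrecognized option {arg!r}; treating it as a path",
                      file=sys.stderr)
            positionals.append(arg)
    return flags, positionals
```

With this shape, `parse_args(["claude-code", "--", "--yes"])` keeps `--yes` as a target path, while `parse_args(["claude-code", "--yes"])` consumes it as a flag.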
`mark_worker_stopped()` previously left `active_instance` pointing at a stopped instance, so workers that exited via STOP/SIGINT/SIGTERM kept the registry routing future work to a dead instance. Now matches the CLI `stop_instance()` behavior: clears `active_instance` if it points at the instance being stopped, then persists. Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)
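The fix can be pictured roughly as below. The registry layout (a dict with `active_instance` and `instances` keys) is an assumption for illustration; the real registry schema and the persistence step are not shown in this PR text.

```python
def mark_worker_stopped(registry, instance_id):
    """Mark an instance stopped and stop routing work to it.

    `registry` is a hypothetical in-memory dict; persisting it is left out.
    """
    instance = registry.get("instances", {}).get(instance_id)
    if instance is not None:
        instance["status"] = "stopped"
    # Only clear the pointer when it refers to the instance being stopped,
    # so stopping a background worker does not disturb the active one.
    if registry.get("active_instance") == instance_id:
        registry["active_instance"] = None
    return registry
```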
Hook only checked `blocked_targets` and the `requires_approval` boolean, ignoring `blocked_patterns` and `requires_approval_patterns` from the shell schema. Now matches command strings against both pattern lists via re.search, blocks bad regex with stderr warnings (fail-soft), and runs before the legacy boolean and permissions.md keyword heuristics. Catches: `curl ... | sh`, `rm -rf /`, `git push --force`, etc. Refs: HIGH_PRIORITY_BUG_REPORT.md (Critical)
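A rough sketch of the pattern check, under assumptions: `check_command` and its return values (`"block"`, `"ask"`, `"allow"`) are hypothetical names, and this sketch skips an invalid pattern after warning on stderr, which is one reading of "fail-soft". The hook's real behavior for bad regexes may differ.

```python
import re
import sys


def check_command(command, blocked_patterns, approval_patterns):
    """Return "block", "ask", or "allow" for a shell command string."""

    def first_match(patterns):
        for pattern in patterns:
            try:
                if re.search(pattern, command):
                    return pattern
            except re.error as exc:
                # Assumption: a malformed pattern is reported and skipped
                # rather than crashing the hook.
                print(f"warning: skipping invalid pattern {pattern!r}: {exc}",
                      file=sys.stderr)
        return None

    # Blocked patterns win over approval patterns.
    if first_match(blocked_patterns) is not None:
        return "block"
    if first_match(approval_patterns) is not None:
        return "ask"
    return "allow"
```

For example, with `blocked_patterns=[r"curl .*\|\s*sh"]`, the command `curl https://example.com/x | sh` is classified as `"block"`.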
No CI previously ran the documented verifier scripts, so high-risk areas could regress without merge-time signal. Workflow runs on push and PR to master with three jobs:
- verifiers (ubuntu): test_claude_code_hook.py, verify_codex_fixes.py, verify_instances.py
- installer-smoke (ubuntu): exercises both `install.sh claude-code <path> --yes` and the documented no-path form `install.sh claude-code --yes`, asserting no literal `--yes/` directory is created
- installer-windows-pwsh (windows): pwsh install.ps1 parity
Refs: HIGH_PRIORITY_BUG_REPORT.md (P0)
Windows installer omitted the documented `pi` adapter from the usage comment, ValidAdapters list, and switch cases. Now mirrors install.sh: creates `<TARGET>/.pi/AGENTS.md` only if absent, then wires `.pi/skills` to `.agent/skills` via SymbolicLink, falling back to Junction, then a recursive copy. Safer than install.sh: an existing real `.pi/skills` directory is renamed to a timestamped `.bak-` rather than rm-rf'd. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
Existing Homebrew test only used the explicit-path form `claude-code <path> --yes`, so it never exercised the broken documented `claude-code --yes` ordering. Pre-creating `testpath/.agent/memory/personal` also masked the install.sh skip-when-exists branch in the .agent copy. Now: removed the pre-creation, asserted `runtime.py` exists after the explicit-path install (full tree copy), then ran `claude-code --yes` inside a fresh subdir asserting no `--yes/` directory was created. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
Manifest-provided names and precondition paths were joined under SKILLS_DIR / ROOT without containment checks, so a poisoned manifest entry with `../` could probe files outside the skill tree. Adds `_within(root, candidate)` resolve-and-relative-to check, regex validation for skill names, and per-file containment checks before opening SKILL.md / KNOWLEDGE.md. Bad entries warn to stderr and skip rather than crash the loader. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
Concurrent `start` calls could both observe no live worker and spawn duplicate workers for the same queue (check then subprocess then mark-started left a TOCTOU window). Adds an fcntl exclusive non-blocking lock on `<runtime>/spawn.lock` held across re-check-spawn-mark, so a contended caller bails fast with "another spawn in flight". Liveness now also checks via `os.kill(pid, 0)` so a stale-but-non-None pid triggers respawn. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
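A sketch of the lock-then-recheck sequence, assuming `fcntl.flock` (the commit only says "fcntl exclusive non-blocking lock") and a Unix host. `try_spawn`, `read_pid`, and `spawn` are hypothetical names; the real control-plane functions are not shown here.

```python
import fcntl
import os


def pid_alive(pid):
    """Probe a pid with signal 0; no signal is actually delivered."""
    if not pid:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # stale pid: the process is gone
    except PermissionError:
        return True    # process exists but belongs to another user
    return True


def try_spawn(lock_path, read_pid, spawn):
    """Hold an exclusive lock across re-check, spawn, and mark-started."""
    with open(lock_path, "a") as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return "another spawn in flight"
        # Re-check liveness only after the lock is held (closes the TOCTOU gap).
        if pid_alive(read_pid()):
            return "already running"
        spawn()  # the caller's spawn() is expected to record the new pid
        return "spawned"
    # The lock is released when lock_file is closed on leaving the with block.
```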
Required and optional sections previously appended unconditionally, with only matched-skills gated by the budget — so an oversized WORKSPACE.md or lessons file would blow past `budget` regardless. Now every append checks `_room()` first. Required sections (role, permissions, paths) are truncated with a marker rather than dropped; optional sections (lessons, episodes, skills) skip with an "[N items omitted]" marker. Reserves a per-required-section header floor so an early section cannot starve later ones. Returns a `_UsedTokens` int subclass exposing `.overflow` while preserving the `(ctx, used)` 2-tuple shape for existing callers. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
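The `_UsedTokens` return shape can be illustrated like this. The class name and the `.overflow` attribute come from the commit message; treating `overflow` as "tokens that did not fit", counting one token per whitespace-separated word, and the simplified `build_context` are assumptions for the sketch (truncation markers and the header floor are omitted).

```python
class _UsedTokens(int):
    """An int that also carries how much content was left out."""

    def __new__(cls, used, overflow=0):
        obj = super().__new__(cls, used)
        obj.overflow = overflow
        return obj


def build_context(chunks, budget):
    """Append chunks while there is room; returns the legacy (ctx, used) tuple."""
    kept, used, overflow = [], 0, 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude token estimate, for illustration only
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
        else:
            overflow += cost
    return "\n".join(kept), _UsedTokens(used, overflow)
```

Existing callers that unpack `ctx, used = build_context(...)` and do arithmetic on `used` keep working, while newer callers can read `used.overflow`.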
`mark_graduated` / `mark_rejected` / `mark_reopened` joined raw
`candidate_id` into paths without sanitization, so an id with `../`
could resolve outside the candidates directory.
Adds module-level `_validate_candidate_id` (regex
`^[a-zA-Z0-9_-]{1,128}$`) called at the top of each lifecycle entry
point, plus `_ensure_within` realpath-containment defense-in-depth
against symlink shenanigans. Non-atomic write fix is a separate commit.
Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
`graduate.py` joined raw `candidate_id` into paths from sys.argv, so a caller could probe candidate-shaped JSON outside the candidate dir. Validates candidate_id at the CLI entry point right after parse_args (rejects with exit code 4) and adds a `_safe_candidate_path` helper that re-validates plus realpath-checks containment under CANDIDATES_DIR. Imports `_validate_candidate_id` from review_state when available, falls back to a local copy with the same regex. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
Duplicate-detection callers in graduate.py and auto_dream.py read LESSONS.md, which is rendered accepted-only — so a provisional lesson could be re-staged or re-graduated as if novel. Adds `render_dedup_text()` and `_load_all_for_dedup()` that include every lesson regardless of status (annotated with the real status), and points the two prefilter call sites at the new function. The accepted-only `render_visible_lessons_md` is unchanged so agent context keeps the same trust boundary. Refs: HIGH_PRIORITY_BUG_REPORT.md (High)
`_write_entries()` did a direct truncate-and-rewrite on AGENT_LEARNINGS.jsonl, so a crash, disk-full, or concurrent hook append during run_dream_cycle could lose the entire log. Now snapshots prior state to `.bak`, writes to `.tmp`, fsyncs, then `os.replace`s atomically. Cleans up `.tmp` on failure with original file intact. `_load_entries(report_malformed=True)` surfaces bad-line counts via stderr from `run_dream_cycle` so corruption isn't silent. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
`stage()` deterministically computed the candidate id and wrote a fresh record with `rejection_count: 0` and an empty decisions list — so re-teaching a previously-rejected candidate erased its rejection history and made churn look novel. Now `_find_prior` checks candidates/, candidates/rejected/, and candidates/graduated/. If a non-provisional graduated record exists, re-staging refuses with exit 3. Otherwise the new record preserves `rejection_count`, `staged_at`, and the prior decisions list, appending a fresh `staged` or `re-staged` entry. The old rejected copy is removed once the new staged file lands so the candidate lives in exactly one location. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
Auto-promoted candidates were written via direct `open(path, "w")`, so an interruption mid-write left a partial file that the listing loop silently skipped. Adds `_atomic_write_json()` helper using `open(path+".tmp","w")` -> flush -> fsync -> `os.replace(tmp, path)`, with a try/except cleanup of the temp file on failure. The single existing JSON write at line 188 now goes through it. Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
`render_lessons()` acquired the lock via `_locked_jsonl()` then called `load_lessons()`, which opened the same path on a separate UNLOCKED descriptor — so concurrent appends could produce torn reads despite the comment claiming the read-render-write cycle was locked. `load_lessons()` now accepts an optional keyword-only `fp=` argument that reads through a caller-provided locked descriptor. `render_lessons()` binds the locked fp from `_locked_jsonl()` and passes it in. Existing positional callers are unaffected. Refs: HIGH_PRIORITY_BUG_REPORT.md (High)
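One way to picture the `fp=` change, as a Unix-only sketch: the function names follow the commit message, but the bodies, the `flock`-based `_locked_jsonl`, and the `text` / `status` record fields are assumptions for illustration.

```python
import contextlib
import fcntl
import json


@contextlib.contextmanager
def _locked_jsonl(path):
    """Yield a file object on `path` while holding an exclusive lock."""
    with open(path, "a+", encoding="utf-8") as fp:
        fcntl.flock(fp, fcntl.LOCK_EX)
        try:
            yield fp
        finally:
            fcntl.flock(fp, fcntl.LOCK_UN)


def load_lessons(path, *, fp=None):
    """Read lessons; reuse the caller's locked descriptor when one is given."""
    if fp is not None:
        fp.seek(0)  # "a+" starts positioned at end of file
        return [json.loads(line) for line in fp if line.strip()]
    with open(path, encoding="utf-8") as own:
        return [json.loads(line) for line in own if line.strip()]


def render_lessons(path):
    """Render accepted lessons while the read happens under the lock."""
    with _locked_jsonl(path) as fp:
        lessons = load_lessons(path, fp=fp)
        return "\n".join(f"- {item['text']}" for item in lessons
                         if item.get("status") == "accepted")
```

Callers that pass only `path` take the second branch and behave as before.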
`post_execution.py` and `on_failure.py` both did raw `open(...,"a").write(json.dumps(...))` into AGENT_LEARNINGS.jsonl, so parallel hook invocations could interleave writes. The dream-cycle rewrite path also raced. Adds `append_episodic_entry()` to `_provenance.py` that takes `fcntl.flock(LOCK_EX)` on the open fd, writes the JSON line, flushes + fsyncs, releases on context exit. Both hooks now go through it. Documents the residual locking-model gap with auto_dream.py's atomic rename rewrite (different mechanisms; worst case is a single lost entry written between snapshot read and rename — acceptable). Refs: HIGH_PRIORITY_BUG_REPORT.md (P1)
`claim_next_job()` removed jobs whose JSON failed to parse, silently losing partial writes or manually corrupted entries with no diagnostic artifact. Now moves the file from `running/` to `failed/<job>.json` via `os.replace` (atomic) and writes a `<job>.json.error.json` sidecar containing the parse error, UTC ISO timestamp, and the original queued/ path. Stderr warning emitted so callers/operators can find the quarantine. Refs: HIGH_PRIORITY_BUG_REPORT.md (P2)
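The quarantine step might look like the sketch below. `load_or_quarantine` and the sidecar field names (`error`, `quarantined_at`, `original_path`) are hypothetical; the commit only states what the sidecar contains, not its keys.

```python
import json
import os
import sys
from datetime import datetime, timezone


def load_or_quarantine(running_path, failed_dir, queued_path):
    """Return the parsed job, or move an unparseable one to failed/ and return None."""
    try:
        with open(running_path, encoding="utf-8") as fh:
            return json.load(fh)
    except json.JSONDecodeError as exc:
        os.makedirs(failed_dir, exist_ok=True)
        dest = os.path.join(failed_dir, os.path.basename(running_path))
        os.replace(running_path, dest)  # atomic move, keeps the bad bytes
        sidecar = dest + ".error.json"  # yields <job>.json.error.json
        with open(sidecar, "w", encoding="utf-8") as fh:
            json.dump({
                "error": str(exc),
                "quarantined_at": datetime.now(timezone.utc).isoformat(),
                "original_path": queued_path,
            }, fh)
        print(f"warning: quarantined corrupt job at {dest}", file=sys.stderr)
        return None
```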
- adapter installed: require all listed files (was passing if any existed; caused false positives in `verify` for opencode/pi/etc.)
- doctor --json: preserve non-zero exit code on failed checks (JSON path was always returning 0, masking failures in CI)
- tui glyphs: swap PASS/WARN/FAIL text labels for ✓/!/✗ glyphs in curses + plain modes; encoding-aware fallback to +/!/x on non-UTF-8 terminals (PYTHONIOENCODING=ascii, LANG=C)
- gitignore: exclude `.agent/memory/**/*.bak` runtime backups
- tests: 6 new regression checks (27/27 passing) covering opencode partial install, hermes single-file, doctor --json broken-project exit code, and glyph fallback across encodings
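The encoding-aware glyph fallback can be sketched as a trial encode. `pick_glyphs` and the two tables are hypothetical names; only the glyph sets themselves come from the commit message.

```python
import sys

UNICODE_GLYPHS = {"PASS": "✓", "WARN": "!", "FAIL": "✗"}
ASCII_GLYPHS = {"PASS": "+", "WARN": "!", "FAIL": "x"}


def pick_glyphs(encoding):
    """Use the Unicode glyphs only if the output encoding can represent them."""
    try:
        "".join(UNICODE_GLYPHS.values()).encode(encoding or "ascii")
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an unknown codec name.
        return ASCII_GLYPHS
    return UNICODE_GLYPHS


# Typical use: choose once, based on where output is going.
glyphs = pick_glyphs(sys.stdout.encoding)
```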
Integrates 72 commits from master (v0.13.0..v0.14.0 + post-tag work) into the trust console branch, then resolves 11 file conflicts.
Resolutions:
- install.sh / install.ps1: took master (rewrote to thin Python dispatcher; feature's bash flag-parsing fixes are obsoleted).
- Formula/agentic-stack.rb: combined master's harness_manager+scripts+transfer test with feature's agentic_stack_cli.py + runtime.py + no-path test. Wrapper still delegates to install.sh; trust console CLI is installed alongside but not the bin entrypoint (follow-up: integrate trust commands into harness_manager.cli).
- README.md, CHANGELOG.md: combined entries from both sides.
- .gitignore: combined; .bak exclusion added under master's structure.
- .agent/tools/learn.py: kept feature's prior-record merge logic, adopted master's UTC timestamp.
- .agent/tools/skill_loader.py: kept both feature's _SAFE_NAME_RE containment check and master's skill_enabled() guard.
- .agent/harness/hooks/on_failure.py, post_execution.py: took master (uses _episodic_io.append_jsonl; feature's _provenance.append_episodic_entry is now redundant).
- .agent/memory/auto_dream.py: took master (flock-based atomic writes supersede feature's tempfile+.bak approach).
Verified post-merge: 27/27 regression checks pass; doctor, tui --plain, verify all exit 0.
Known follow-ups (defer to post-merge):
- Formula version/sha bump for v0.15.0 release tag (P1 from codex review).
- Wire trust console commands into harness_manager.cli or update bin wrapper so `agentic-stack doctor` resolves to the trust console CLI.
Summary
- `agentic-stack doctor`, `tui`, `memory ...`, `verify`, `team ...` commands inspect the file-backed `.agent/` data layer with no daemon. Same normalized data model powers text, JSON, and a read-only stdlib curses TUI.
- Glyphs (✓ / ! / ✗) replace `PASS`/`WARN`/`FAIL` in the TUI; encoding-aware fallback to + / ! / x under non-UTF-8 stdout.
- `doctor --json` preserves non-zero exit on failure, glyph fallback for ASCII terminals.
- [...] (`verify_trust_console.py`).
- [...] `bddc63b`.

Test plan
- `python3 verify_trust_console.py`: 27/27 pass
- `python3 agentic_stack_cli.py doctor`: exit 0
- `python3 agentic_stack_cli.py tui --plain`: renders glyphs
- `python3 agentic_stack_cli.py verify --all --json`: exit 0
- `install claude-code <tmp> --yes` + `verify claude-code`: all six conformance dimensions pass